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Abstract. Previous approaches to robustness in natural language pro- 
cessing usually treat deviant input by relaxing grammatical constraints 
whenever a successful analysis cannot be provided by "normal" means. 
This schema implies, that error detection always comes prior to error 
handling, a behaviour which hardly can compete with its human model, 
where many erroneous situations are treated without even noticing them. 
The paper analyses the necessary preconditions for achieving a higher 
degree of robustness in natural language processing and suggests a quite 
different approach based on a procedure for structural disambiguation. 
It not only offers the possibility to cope with robustness issues in a more 
natural way but eventually might be suited to accommodate quite dif- 
ferent aspects of robust behaviour within a single framework. 

1 Robustness in Natural Language Processing 

The notion of robustness in natural language processing is a rather broad one 
and lacks a precise definition. Usually, it is taken to describe a kind of mono- 
tonic behaviour, which should be guaranteed whenever a system is exposed to 
some sort of non-standard input data: A comparatively small deviation from a 
predefined ideal should lead to no or only minor disturbances in the system's 
response, whereas a total failure might only be accepted for sufficiently distorted 
input. 

Under this informal notion robustness may well be interpreted as a system's 
indifference to a wide range of external disruptive factors including 

— the inherent uncertainty of real world input, e.g. speech or hand writing, 

— noisy environments, 

— the variance between speakers, for instance idiolectal, dialectal or sociolectal, 

— "erroneous" input with respect to some normative standard, 

— an insufficient competence of the processing system, if e.g. exposed to a 
non-native language or new terminology, 

— highly varying speech rates and 

— resource limitations due to the parallel execution of several mental activities. 

One of the most impressive features of human language processing is the 
ability to retain its basic capabilities even if it is exposed to a combination of 
adverse factors. Technical solutions, on the other hand, are likely to have serious 
problems if confronted with only a single type of distortion, apart from the 
fundamental difficulties to supply the desired monotonic behaviour at all. 



Accordingly, problems of robustness in NLP have almost never been consid- 
ered from a unifying perspective so far. A number of very specific techniques for 
some of those different aspects has been developed, which hardly can be related 
to each other. 

Robustness, for instance, is a key issue in speech recognition, where reliable 
recognition results for a variety of speakers and speaking conditions are desired. 
Two basic technologies attempt to support this goal 

— robust stochastic modelling techniques which are able to capture generaliza- 
tions across the individual varietya and 

— sophisticated search procedures which select among huge amounts of compet- 
ing recognition hypotheses by comparing probability estimations for signal 
segments of increasing length. 

Special signal enhancement techniques are used to suppress stationary environ- 
mental noise. There are other aspects of robustness which even have not been 
treated at all, including the flexible adaptation to external time constraints or 
internal resource limitations. 

Traditionally the notion of- robustness has been strongly connected to the 
processing of ill-formed inputEl, where ill-formedness can be defined both, in 
terms of human standards of grammaticality or in terms of unexpected input. 
Most of the work has been concerned with the problem from a purely syntactic 
point of view and usually relied on two basic techniques: error anticipation and 
constraint relaxation. 

Error anticipation identifies a number of common mistakes and tries to in- 
tegrate them into the existing grammar by devising dedicated extensions to its 
coverage. Therefore, the method is limited to a few selected types of deviant 
constructions which are notorious and therefore predictable, namely 

— stereotypical spelling mistakes [*comittee, *rigth, etc.), 

— performance phenomena in spoken language, like restarts (cf. ||) and 

— interference-based competence errors in early phases of second language 
learning (cf. Jl|). 

Obviously, the complete "innovative" potential and the individual creativity for 
producing ill-formed input cannot be adequately captured by such means alone. 

On the other hand, constraint relaxation techniques rely on a systematic vari- 
ation of existing grammar rules written for standard input. Initially, the idea 
was restricted to the stepwise retraction of e.g. agreement conditions in syntac- 
tic rules. It can easily be extended to incorporate arbitrary rule transformations 
in order to allow for the insertion, deletion, substitution and transposition of 

1 The difficulties with a straightforward generalization of this approach to e.g. syntactic 
or semantic anomalies are obvious: It would require huge amounts of sufficiently de- 
viant utterances being available as training data. This renders the approach technically 
infeasible and cognitively implausible. 

For similar reasons connectionist approaches are not considered here: At the moment 
they seem to be limited to approximate solutions for fiat representations (cf. p7|), 

2 For a good overview see p5|. 



elements. The difference vanishes completely within modern constraint-based 
formalisms (2f| where a transposition of constituents can be interpreted 
equally well as a relaxation of linear precedence constraints. Furthermore, con- 
straints can be annotated by their degree of vulnerability, hence allowing to 
include aspects of error anticipation into the relaxation framework. 

Since both, error anticipation and constraint relaxation considerably enlarge 
the generative capacity of the original grammar they will lead to spurious am- 
biguities and serious search problems. This restricts their application to a kind 
of post mortem analysis! Only if a failure of the standard analysis procedure 
indicates the presence of non-standard input, error rules or relaxation techniques 
are activated to integrate the fragmentary results obtained so far. 

Even a superficial comparison with human processing principles shows the 
fundamental deficit of these approaches. A human reader or listener accepts ill- 
formed input to a wide degree, often without noticing an error at all. This is 
particularly true if strong expectations concerning the content of the utterance 
are involved or if heavy time constraints restrict the processing depth. 

Obviously, there is a fundamental parallelism between robustness issues and 
time considerations, which syntactically oriented solutions lack so far. Robust- 
ness in human language processing does not amount to an additional effort, 
but instead facilitates both, insensitivity to ill-formed input as well as a flexible 
adaptation to temporal restrictions. 

This basic pattern is much better modelled by semantically oriented ap- 
proaches based on the slot-and-filler-principle. Here, highly domain specific ex- 
pectations are coded by means of frame-like structures and checked against the 
input for satisfaction. The schema can be successfully extended to a kind of skim- 
ming understanding bringing together the question of robustness against syn- 
tactically ill-formed input and some simple considerations concerning resource 
limitations. 

This advantage of a semantically guided analysis, however, is won by the 
cost of excluding another important robustness feature, namely the ability to 
cope with unexpected input (e.g. a change of topic beyond the narrow limita- 
tions of the domain or the violation of selectional restrictions in metaphorical 
expressions). 



2 Observations from Human Language Processing 

Psycholinguistic evidence provides a contradictory picture of human language 
processing. Some observations clearly support a rather strong modular organi- 
zation with processing units of great autonomy like syntax and semantics Q 
[^). On the other hand there is a considerable semantic influence on the assign- 
ment of syntactic structure [^(J which suggests a highly integrated processing 
architecture. 

3 There are exceptions to every rule: For language learning purposes propose an 
initial analysis based on a moderately weak grammar and followed by a more rigid 
second pass. 



Robust behaviour in natural language understanding seems to require both, 



— the autonomy between parallel lines of processing which embodies redun- 
dancy and allows to compensate partial insufficiencies and 

— the interactive nature of informational exchange which allows to relate par- 
tial structures on different levels of granularity. 

Functional autonomy undoubtedly is of fundamental importance for robustness. 
It allows to yield an at least vague interpretation even in cases of extremely 
distorted input: 

1. A semantically almost empty sentence can be analysed quite well by syn- 
tactic means alone, delivering a hypothetical interpretation in terms of a 
possible world with highly underspecified referential object descriptions and 
possibly ambiguous thematic roles. 

"... und grausig gutzt der Golz. "p3|] (1) 

2. Syntactically ill-formed utterances are interpreted based on semantic and 
background knowledge even if subcategorization regularities or other gram- 
matical constraints are violated. 

Although both processing units are - at least partially - able to generate some 
useful interpretation independently of the other one, best results, of course, are 
to be expected if they combine their efforts in a systematic way. 

Parallel and autonomous structures in language processing have not only 
evolved between syntactic and semantic aspects of language. They can be ob- 
served equally well at the level of speech comprehension where auditory (hearing) 
and visual (lip-reading) clues are usually combined to achieve a reliable recogni- 
tion result. Again, both systems - in principle - are able to work independently, 
but synergy occurs if both are activated concurrently. 

A second group of observations related to the question of robustness concerns 
the expectation- driven nature of human language understanding. Here, expecta- 
tions come to play at two different dimensions: 

— Syntactic, semantic and pragmatic predictions about future input derived 
from previous parts of the utterance or dialogue. 

— Expectations exchanged between parallel and autonomous processing struc- 
tures for syntax and semantics. 

The role of dynamic expectations has mostly been investigated from the view- 
point of a possible search space reduction in prediction based parsing strategies 
(namely left or head corner algorithms). If used to select between competing 
hypotheses in speech recognition the predictive capacity of a grammar can con- 
tribute additionally to an enhanced robustness of the overall system jn| |Q . 

Although the importance of predictions for robustness is beyond question, 
here the second type of expectations shall be examined as a matter of prior- 
ity, since they are expected to establish the attempted informational coupling 
between parallel processing units. As the simple examples above have shown, 



no predefined direction for this exchange of information can be assumed. Cer- 
tain syntactic constructions may trigger specific semantic interpretations, a view 
which is strongly supported by the traditional perspective on the relation be- 
tween syntax and semantics. In the opposite direction, semantic relations, e.g. 
derived from background knowledge, can not only be used to disambiguate be- 
tween preestablished syntactic readings, but moreover are able to actively pro- 
pose suitable syntactic structures. This bidirectionality of interaction seems to 
be of great importance for the ability to provide the mutual compensation nec- 
essary to treat deviant constructions of different kind. 

Of course, the expectation-based nature of natural language processing can- 
not guarantee a failure-proof performance under all circumstancesu. There cer- 
tainly are situations in which strong expectations may override even sensory 
data. Such a situation can easily be studied in everyday conversation when- 
ever e.g. pragmatic expectations are predominant. A similar problem occurs in 
experimental settings using intentionally desynchronised video input, where lip 
reading information sometimes overrides even the auditory stimulus. The prob- 
lem is witnessed as well by the difficulties usually encountered in proof-reading 
one's own text: Extremely strong expectations concerning the content usually 
cause minor mistakes to be passed unnoticed. 

Typically, expectations are contradictory and will be of different impact on 
the progress of the analysis procedure. Hence, there is a third principle of ro- 
bust language processing upon which the human model builds. It concerns the 
preference-based selection between both, competing interpretations as well as 
different expectations fj^ ]. Expectations have to be ranked according to their 
particular strength and weighted against each other. 

Recently linguistic research has shown a remarkable trend towards the de- 
velopment of integrated models of language structure. One of the more popular 
examples surely is Head-Driven Phrase Structure Grammar (HPSG [p4f ), where 
syntactic and semantic descriptions are uniquely related to each other by coref- 
erential pointers within the framework of typed feature structures. The strong 
coupling on the level of representation and on the level of processing (i.e. within 
unification) completely lacks autonomy. The construction of a logical form is 
always mediated by syntactic descriptions taken e.g. from subcategorization in- 
formation. Since syntactic and semantic restrictions are conjunctively combined 
the overall vulnerability against arbitrary impairment of the input utterances 
even increases: An analysis may now fail due to syntactic as well as due to 
semantic reasons. 

A quite similar conclusion can be drawn for construction grammar ||, an- 
other integrated approach. It combines syntactic, semantic and even pragmatic 
information in a single representation named construction. Again, autonomy of 
individual description levels is missing and even if constructions are supplied 
with preferential weightings derived from their frequency of use (as realized in 



4 Note that perfect performance is not necessarily covered by the informal notion of 
robustness introduced earlier. 



SAL |LJ) robustness does not increase. 

A clearcut separation of representational levels has actually been realized in 
the cognitively motivated parser COMPERE Jll| jl8) . The system aims at mod- 
elling error recovery techniques for garden-path sentences. It uses an arbitration 
mechanism to decide in case of a conflict situation which alternative reading 
should be backed up. This allows to combine early commitment decisions with 
the possibility to switch to another interpretation if necessary later on. Although 
the parser is guided in its decisions by different kinds of preferences, the map- 
ping between syntactic and semantic representations seems to be a strict one. 
Accordingly, it does not provide the necessary means for conflict resolution in all 
those cases of non-standard input for which no interpretation can be established. 
In particular, three different cases can be distinguished 

1. failure on a single level (syntax or semantics) 

2. failure on both levels (syntax and semantics) 

3. no consistent mapping between levels 

Whereas the first case might be easily accommodated by the arbitration mech- 
anism the latter two require the abandonment of the strict mapping and its 
replacement by a preference-based module interaction. 

3 Disambiguation by Constraint Propagation 

A suitable combination of the three principles discussed above might in fact 
provide the foundation for an effective use of redundancy in parallel processing 
structures 

— autonomy guarantees a fall-back behaviour for failures of a single module 

— expectancy- oriented analysis facilitates the informational exchange and 

— preference-based processing guides the analysis towards a promising interpre- 
tation and establishes a loose coupling between modules. 

These principles, even if taken together, do not explain the almost unconscious 
treatment of errors in everyday communication. To simulate a similar behaviour 
a selective constraint invocation strategy will become necessary. Then, parsing 
is understood as a disambiguation procedure, which activates only specific parts 
of the grammar, if this is deemed to be unavoidable for solving a particular 
disambiguation problem. The procedure can be terminated if a sufficiently reli- 
able disambiguation has been achieved even if certain conditions of the grammar 
have never been checked so far. Robustness is not introduced by a post mortem 
retraction of constraints but rather by their careful invocation. 

Along these lines a rudimentary kind of robustness has been achieved in the 
Constraint Grammar framework |l5| , a system for parsing large amounts of un- 
restricted text. Constraint Grammar (CG) attempts to establish a dependency 
description which is underspecified with respect to the precise identity of modi- 
fiees. Initially, it assigns a set of morphologically justified syntactic labels to each 
word form in the input sentence. Possible labels among others are 



O+FMAINV the finite verb of a clause 

OSUBJ a grammatical subject 

OOBJ a direct object 

@DN> a determiner modifying a noun to the right 

@NN> a noun modifying a noun to the right 

The initial set of labels is successively reduced by applying compatibility 
and surface ordering constraints until a unique interpretation has been reached 
or the set of available constraints is exhausted. In the latter total dis- 

ambiguation cannot be achieved by purely syntactic means, as in the following 
attachment example: 

Bill saw the little dog in the park 

@SUBJ O+FMAINV @DN> @AN> @0BJ @<N0M @DN> @<P 

@<ADVL 

In contrast to traditional grammars of the phrase structure type which li- 
cense well-formed structures according to their rule system, constraint grammar 
rather happens to be an eliminative approacm. Instead of imposing a normative 
description on the input data it takes them as starting point and tries to find a 
plausible interpretation for them. 

This proceeding is motivated by the finding that language is an open-ended 
system and so grammar formalisms based on a "rigid and idealized conception 
of grammatical correctness are bound to leak" @, p. 37]. Par sing, if understood 
as a disambiguation procedure, is put down to the principle of parsimony: The 
more effort is spent the better disambiguation results can be expected and 
p. 39] points to the important psycholinguistic parallel: 

"Mental effort is needed for achieving clarity, precision and maximal 
information. Less efforts imply (retention of) unclarity and ambiguity, 
i.e. information decrease. In several types of parsers, rule applications 
create rather than discard ambiguities: the more processing, the less 
unambiguous information." 

Parsing as disambiguation can well be extended to deal with fully specified de- 
pendency structures without loosing its promising characteristics. A complete 
disambiguation of structural descriptions has first been described for Constraint 
Dependency Grammar (CDG PH ) and simply requires to replace the monadic 
categories of CG by pairs consisting of the relation name and the exactly speci- 
fied modifiee. The mutual compatibility of modifying relations is checked against 
a set of constraints and thus the set of possible modifications is successively 
reduced by a constraint propagation mechanism. Further extensions of the ap- 
proach concern the inclusion of feature descriptions, valency specifications and 
valency saturation conditions g]. 



5 With this respect a strong parallel between the eliminative nature of disambiguation 
and cohort modelling ideas for spoken word recognition 119] becomes visible. 



Though, a closer inspection of the kind of robustness feature introduced by 
the eliminative mode of operation reveals that its nature is quite accidental so 
far. Which types of deviation can be tolerated indeed, strongly depends on the 
rather arbitrary sequence of constraint applications. This shortcoming seems to 
be closely connected to the fact that both formalisms lack the notion of prefer- 
ence so far and therefore do not have the possibility to model the "quality" of 
a constraint]!!. Hence, adding a preference-based selection strategy will be one of 
the most pressing needs for further improvement. Such an extension will be pro- 
posed in section 5. Before we turn to this topic section 4 introduces a modular 
representation schema along the traditional syntax-semantics distinction. It sup- 
ports the desired functional autonomy as well as a highly interactive exchange 
of expectations between the two layers. 



4 Representation Layers 

Whereas Constraint Grammar restricts itself to purely syntactic means, an in- 
tegration of simple semantic criteria into Constraint Dependency Grammar has 
been proposed recently || . It takes into account sortal restrictions only, attach- 
ing them to surface syntactic relations without aiming at modularity and au- 
tonomous behavior. In order to facilitate functional independence it will become 
necessary to establish separate layers for structural description and constraint 
propagation 

— a syntactic layer relating word forms according to functional surface struc- 
ture notions e.g (subject-of, dir-object-of, prep-modifier-of, etc.) 
and using constraints on ordering, agreement, valency, and valency satura- 
tion to select among competing structural configuration and 

— a semantic layer building sentence structures by means of thematic roles (like 
AGENT-OF, instrument-OF, time-OF, etc.) thereby relying upon the argu- 
ment structure of semantic predicates and their corresponding selectional 
restrictions £1 

The following small and rather rigid sample grammar illustrates the different 
types of constraints needed: 

1. licensing conditions for modification relations 

syl: cat(dep(X))=N n 
-> cat(synmod(X))=V A synlab(X)e{SUBJ,OBJ} i 

A noun can modify a verb either as a subject or as a direct object. 

6 CG at least includes heuristic constraints, which may be activated at a particular stage 
of the disambiguation procedure 

7 The proposed separation quite closely corresponds with the one chosen in jll|] 

8 dep(X) refers to the modifier of a relation, syndom(X) and semdom(X) to its modi- 
fiees. synlab(X) and semlab(X) are the respective relation names. cat(X), num(X) , 
semprop(X) , . . . denote properties of the corresponding node. 



2. agreement conditions 

sy2: synlab(X)=SUBJ — > num(dep(X) )=num(syndom(X) ) 

A subject agrees with its modifiee with respect to number. 

3. linear ordering constraints 

sy3: synlab(X)=SUBJ — > pos(dep(X))<pos(syndom(X)) 

The subject precedes the finite verb. 

4. compatibility constraints! 

sy4: syndom(X)=syndom(Y) — > synlab(X) 7^ synlab(Y) A word form 

cannot be modified twice by the same relation. 

syl through sy3 are unary constraints, sy4 is a binary one. Note that constraints 
refer to modifying relations instead of word forms. Therefore they are able to 
express admissibility conditions on local configurations consisting of up to three 
nodes. Note as well that sy3 - even for German main clauses - has a strong 
heuristic appearance and simply states a preference condition which additionally 
requires a suitable exception handling mechanism. 

In a very similar fashion semantic constraints comprise 

1. licensing conditions 

sel: cat(dep(X))=N -» cat(semdom(X))=V A semlab(X) g{AG,PAT} 

2. selectional restrictions 

se2: word(semdom(X) )=f ressen A semprop(mod(X) )=animal 
— > semlab(X)=AG 

Animals do eat. 

se3: word(semdom(X) )=f ressen A semprop(mod(X) )=plant 
-> semlab(X)=PAT 

Plants are to be eaten. 

3. compatibility constraintsEl 

se4: semdom(X)=semdom(Y) — > semlab(X)^semlab(Y) 

Adhering to the principle of autonomy both layers are designed in a way 
which allows them to propagate constraints in a completely independent man- 
ner. Each modifier is specified for two possibly different modifiees and no cross- 
reference between the layers has been used so far. 

In order to finally mediate the interaction between layers, a set of mapping 
constraints has to be provided which sets up bidirectional correspondences 



9 In fact there is another general compatibility constraint implicitly built into the decision 
procedure and excluding ambiguous modifying relations from being consistent: 
sygen: -1 (syndom(X)=syndom(Y) «-> dep(X)=dep(Y) ) 
10 Again, supplemented by a general semantic uniqueness constraint 
segen: -1 (semdom(X)=semdom(Y) <-> dep(X)=dep(Y) ) 



ssl: syndom(X)=semdom(X) — > (synlab(X)=SUBJ semlab=AG) 

The subject of a verb is always identical to its agens. 

ss2: syndom(X)=semdom(X) — *■ (synlab(X)=OBJ semlab=PAT) 

The direct object of a verb is always identical to its patiens. 

It should have become obvious that the selectional restrictions as well as the 
mapping constraints at best can be taken to stand for a preferential interpreta- 
tion. They surely are much to rigid to be sensibly used within a framework of 
strict reasoning. 

Semantic constraints need not be restricted to linguistically motivated (i.e. 
universally valid) ones. In particular, domain-specific restrictions play a crucial 
role in semantic disambiguation and should urgently be incorporated whenever 
possible. Here the semantic layer offers a convenient interface to a knowledge 
representation component which (on demand) can contribute constraints from 
e.g. specialized ontologies, referential instantiations or temporal reasoning. 

5 Weakening Constraints 

So far, one of the most striking shortcomings has been the strictly binary nature 
of constraint satisfaction. Not surprisingly, it turned out to be most inappro- 
priate within the area of semantic modelling where hardly a constraint can be 
formulated without restricting oneself to a particular, preferential reading. 

In what follows, preferences arc not modelled in the usual direct manner 
by emphasizing particular well-formed interpretations but rather indirectly by 
putting a penalty on all remaining alternatives which violate a constraint. For this 
purpose each constraint gets a penalty factor pf assigned reducing the confidence 
score in negative cases. Penalty factors may range from zero to one where 

pf =0 specifies a strict constraint in the classical sense and 

0<pf <1 indicates a soft constraint accepting contradictory cases 
with a confidence value proportional to pf 

Obviously, a value of one is meaningless because it neutralizes the constraint. 
Penalty factors are combined multiplicatively, i.e. compatibility matrices within 
the constraint satisfaction problem no longer contain binary categories but con- 
fidence scores also ranging from zero (for impossible combinations) up to one 
(for combinations not even violating a single constraint). 

The indirect treatment of preference by penalty factors offers a consistent 
extension to the basic paradigm of constraint satisfaction. It does not sacrifice 
the eliminative nature of constraints but simply softens it. Inappropriate readings 
are excluded only if they violate strict constraints. In all other cases they are 
downgraded to a certain degree. 

In particular, the penalty-based approach helps to tackle some normaliza- 
tion problems otherwise inherently connected with the constraint satisfaction 



approach: Most modifying relations (or combinations of them) will pass a con- 
straint simply because it is irrelevant for that particular configuration. An in- 
crease of goodness estimates for these cases would yield a highly undesirable, 
since unjustified reinforcement. 

By assigning the penalty factors pf (syl)=pf (sy4)=0 to the constraints syl 
and sy4 from section 3 both are declared to be strict ones, a fact obviously 
being valid for the toy-size sample grammar which does not take into account 
coordinative structures. Using pf (sy2)=0. 1 the agreement condition is treated 
as a rather strong one which allows exceptions only occasionally, pf (sy3)=0.3 
on the other hand results in a much more permissive constraint justified by the 
fact that sy2 is meant to exclude ungrammatical utterances but sy3 only to 
disfavour a marked ordering. 

On the semantic layer only the licensing constraint sel is declared as a strict 
one. The compatibility constraint se4 is weakened considerably in order to ac- 
count for double modification as in the case of anaphoric reference. The two 
selectional constraints receive penalty factors of different strength in order to 
model the lower probability of a plant eating something (se2) compared to an 
animal being eaten (se3): 

pf (sel)=0.0 
pf (se2)=0.1 
pf (se3)=0.7 
pf (se4)=0.5 

The mapping constraints, finally, are weighted in a way which strongly favours 
the SUBJECT-AGENS and OBJECT-patiens pairings nevertheless allowing alter- 
native interpretations e.g. in passive sentences. 

pf (ssl)=0.2 
pf (ss2)=0.3 

Alternative readings can but need not be specified explicitly. In more realistic 
applications though it is recommended to aim at a considerably richer modelling, 
otherwise an unbalanced penalty factor as between se2 and se3 may create a 
sometimes undesired strong bias by default: If both constraints appear to be of 
no relevance the patience reading is clearly prioritized. In the example chosen 
this corresponds to the acceptable interpretation that an arbitrary thing is more 
likely to be eaten than to eat. 

After having introduced penalty factors as a means of modelling preferences 
constraint propagation can be extended from the classical case of strictly bi- 
nary decisions to the handling of confidence scores. The application of penalty- 
weighted constraints to a disambiguation problem now consists of two steps: 

1. the calculation of initial confidence scores for all combinations of syntactic 
and semantic modification relations and 

2. a selection procedure pruning the search space by sorting out unlikely inter- 
pretations 

The selection procedure is based on a local assessment function heuristically 
identifying relations to be pruned. In order to not select promising hypothe- 



ses assessment first of all should only take into account modification relations 
characterized by the following three criteria 



— being close to the global minimum for all modification relations, combined 
with 

— an as possible as high contrast to alternative relations and 

— a low contrast between all the confidence scores supporting the relation in 
question. 

For experimental purposes a selection procedure based on the sum of quadratic 
errors for setting scores to zero has been used. Hence, structural interpretations 
violating a high number of rather strong constraints are pruned first. 

Using the toy grammar specified above together with its penalty scores the 
arbitration process between syntactic and semantic evidence in simple disam- 
biguation problems can be studied. Thus in a sentence like 

Pferde fressen Gras. (Horses eat grass.) (2a) 

both layers uniformly support a single interpretation 

AG PAT 



Pferde fressen Gras 

Due to the strong semantic support the interpretation remains unchanged if a 
marked ordering (topicalization of the direct object) is chosen (2b), an agree- 
ment error is introduced (2c) and both deviations are combined finally (2d). 

Grasp at fressen PferdeAG- (2b) 
PferdAG fressen Grasp at- (2c) 
Grasp at fressen PferdAG- (2d) 

The interpretation is retained even if its semantic support is neutralized as in 

the following utterance, containing a twofold type shift. 

Autos ag fressen Geld P AT- (Cars eat money.) (3a) 

GeldpAT fressen Autos ag- (3b) 
Auto ag fressen GeldpAT- (3c) 
It switches to the alternative interpretation only in the case of combined syn- 
tactic distortions 

GeldAG fressen AutopAT- (3d) 
Even for the counterintuitive example 

GraserAG fressen PferdpAT- (4a) 
which, if desired, could be taken as a headline-style utterance, syntactic evidence 
will gain the upper hand against the violation of two selectional constraints. This 



interpretation, however, happens to be a rather fragile one and breaks immedi- 
ately under arbitrary syntactic variation. 

Since the selection procedure operates on a global assessment of local struc- 
tural configurations it cannot guarantee to find an optimal and globally consis- 
tent interpretation. The partially local mode of operation, on the other hand, 
can be expected to provide a quite natural explanation for human garden-path 
phenomena. Within the framework of preference-based disambiguation they turn 
out to be a special case of contradictory situations which manifest themselves as 
expectation violations: The consequences of a pruning decision may not coincide 
with local confidence estimations elsewhere in the constraint network. 

Expectation violations not necessarily do indicate an erroneous situation. 
They are frequently used as a speaker's intentionally chosen means to attract 
the attention of the audience. This happens for instance by deviating from an 
unmarked ordering to emphasize a topicalized constituent (c.f. (3b)) or by oth- 
erwise producing unexpected utterances. 

On the other hand there are the typical erroneous situations which in case of 

— internal difficulties (e.g. due to early commitment strategies in garden path 
situations) might offer the possibility to initiate a reanalysis and 

— external reasons (e.g. ill-formed input) can be used to track down the error 
to find a possible remedy for it. 

Note that in the latter case the situation coincides with basic observations for 
the human model: Finding an interpretation for erroneous utterances will be 
easier - in terms of effort to spent - than detecting the error, which in turn will 
be less demanding than localizing or even correcting it. 

As with human language processing there will be no predefined direction for 
the general flow of expectations during arbitration. Whether syntactic evidence 
is propagated from the syntactic to the semantic layer or vice versa depends 
only on the available information. This seems to be in accordance with recent 
psycholinguistic findings which contest the existence of purely structural disam- 
biguation principles • 

6 Preference-based Reasoning 

Eliminating implausible interpretations by locally pruning less favoured modi- 
fication relations represents only one, though fundamental method for the dis- 
ambiguation of natural language utterances. By selecting among modifying rela- 
tions according to negative evidence from maximally dispreferred hypotheses, the 
technique fits quite well into the constraint satisfaction approach and achieves 
its robust behaviour by avoiding extremely risky decisions on a locally topmost 
reading. Taking this as a starting point the basic way of reasoning can well be 
complemented by a second propagation principle based on preference-induced 
constraints. These are activated only in situations where enough positive evi- 
dence can be derived from almost uniquely determined preferences. Since the 
existence of convincing preferences in realistic disambiguation tasks represents 



rather the exception than the rule the nature of this propagation principle is 
secondary. 

Preference-induced constraints consist of implications P —* p C which, given 
enough evidence for the unary precondition P, require the possibly binary con- 
straint C to hold. Constraints of this type can be used to model e.g. the higher- 
order conditions which in many mapping situations involve more than two rela- 
tions. 

pssl: word(syndom(X))=im A synlab(X)=PHEAD 
A semprop(dep(X))e{TEMP,LOC} 
-> p synlab(Y)=PMOD A dep(Y)=syndom(X) 

A semdom(Z)=sydom(X) — » semlab(Z)=PART-OF 

A prepositional phrase headed by the word form "im" fills a semantic 
PART-OF slot 

In postnominal positions like 

Dann nehmen wir die erste Woche im Mai. (5) 
(Let's take the first week in may.) 

this constraint puts a preference on the lower attachment since a part-of relation 
usually is not licensed as an argument position for verbs. 

Preference-induced constraints can also be used to modify value assignments 
at certain nodes in the constraint network without the necessity to copy them. 
By applying this technique, phrasal feature projections can be modelled in order 
to build up descriptions for partial dependency trees 

pss2: synlab(X)=DET 

— > p case(syndom(X)) : =case (syndom(X) ) n case(dep(X)) 

A noun group carries the intersection of possible case features 
found at its members. 

Preference-induced constraints introduce a kind of inhibitory mechanism to the 
disambiguation procedure: Already preferred interpretations is given the chance 
to propagate their consequences over the network thus possibly leading to further 
suppression of alternative readings. 

7 Conclusion 

Combining the climinative nature of a disambiguation procedure with a system 
architecture supporting bidirectional arbitration between syntactic and semantic 
evidence has turned out to be a key factor for achieving a higher level of robust- 
ness in language understanding. While the disambiguation paradigm provided 
the basic fall-back behaviour (an arbitrary utterance will get a description as- 
signed) and the possibility to prune the search space towards a least disfavoured 
reading, the parallel arrangement of modules allows to interactively exchange 
expectations and thus bypassing local interpretation difficulties. By modelling 
a preference distribution based on penalty factors, the desired robust behaviour 
can be demonstrated at least for very simple sample utterances. Although no 



conclusive judgement about the feasibility of the approach can be given until 
the experimental setting has been scaled up to a fairly realistic problem size, 
a remarkable qualitative advance over comparable approaches becomes evident 
even on this elementary level: 

— The approach departs from a predefined sequential arrangement of modules 
in favour of a strictly symmetrical architecture consisting of autonomous 
components for syntax and semantics. 

— It allows to treat syntactic ill-formedness and semantic deviations by pro- 
viding a mechanism for mutual compensation. Syntactically anomalous ut- 
terances can be understood as long as there is enough semantic and/or 
pragmatic evidence. In order to communicate novel or unusual content a 
sufficiently high degree of syntactic support is required. 

— Insufficient modelling information on any one of the processing layers might 
well result in the selection of an odd interpretation but will not cause the 
language processing unit to break down entirely 

— Robustness is not an add-on feature of an otherwise temperamental proce- 
dure but falls out from the basic properties of the processing mechanism. 

Since structural disambiguation by constraint satisfaction likewise lends itself to 
the creation of time sensitive parsing procedures |2^| , in the long run it might 
provide a unifying foundation to build language processing systems upon which 
embody aspects of robustness against such different disruptive factors as syntac- 
tically ill-formed input, metaphorical use and dynamic time constraints. 
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